Optimized Inflated 3D Convolutional Neural Networks for Robust Human Action Recognition in Surveillance Videos

Author Details

Naga Charan Nandigama

Journal Details

Published: 13 August 2019 | Article Type: Research Article

Abstract

Human action recognition in video surveillance remains a challenging task in computer vision, particularly when dealing with long-duration activities, viewpoint variations, and crowded scenes. This paper presents an Optimized Inflated 3D Convolutional Neural Network (Opt-3D-Inflated-CNN) architecture designed for accurate and efficient temporal-spatial feature extraction from surveillance video sequences. The proposed approach combines 2D-to-3D filter inflation with a parallel-branch architecture and temporal fusion mechanisms to capture both local motion patterns and global spatio-temporal dynamics. Comprehensive evaluation on two benchmark datasets, UCF101 (101 action categories) and HAR (6 action classes), demonstrates state-of-the-art performance, with 97.8% accuracy on UCF101 and 94.75% accuracy on the HAR dataset, representing improvements of 8.2% and 10.89%, respectively, over baseline 3D-CNN models. The system achieves real-time processing, with computational efficiency suitable for edge deployment in surveillance systems.
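
The 2D-to-3D filter inflation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly used inflation recipe, in which a 2D kernel is repeated along the temporal axis and rescaled by the temporal depth; the abstract does not confirm that this is the exact procedure used in the paper. The function names `inflate_kernel`, `response_2d`, and `response_3d`, the Sobel-style kernel, and the toy patch are all hypothetical and chosen only for illustration.

```python
def inflate_kernel(kernel_2d, depth):
    """Inflate a 2D kernel (list of rows) into a 3D kernel with `depth` temporal slices.

    Assumed recipe: every temporal slice is the 2D kernel scaled by 1/depth,
    so the slices sum back to the original 2D kernel.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return [[[w / depth for w in row] for row in kernel_2d] for _ in range(depth)]


def response_2d(kernel_2d, patch):
    """Response of a 2D kernel at one spatial position (element-wise product, summed)."""
    return sum(
        w * x
        for k_row, p_row in zip(kernel_2d, patch)
        for w, x in zip(k_row, p_row)
    )


def response_3d(kernel_3d, clip):
    """Response of a 3D kernel on a clip given as a list of same-sized patches."""
    return sum(response_2d(k_slice, frame) for k_slice, frame in zip(kernel_3d, clip))


# Hypothetical 3x3 horizontal-gradient kernel and a toy image patch.
sobel_x = [[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]]
patch = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]

depth = 4
kernel_3d = inflate_kernel(sobel_x, depth)

# A "static video": the same patch repeated for every frame.
static_clip = [patch] * depth

out_2d = response_2d(sobel_x, patch)
out_3d = response_3d(kernel_3d, static_clip)
print(out_2d, out_3d)
```

On a static clip the inflated kernel reproduces the response of the original 2D kernel, which is the property that lets pretrained 2D weights serve as a sensible starting point for a 3D network under this recipe.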

Keywords: 3D Convolutional Neural Networks, Action Recognition, Temporal-Spatial Feature Learning, Video Surveillance, Deep Learning, Inflated Convolutions, Motion Feature Extraction, Multi-branch Architecture.

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Copyright © Author(s) retain the copyright of this article.

How to Cite

Citation:

Naga Charan Nandigama. (2019-08-13). "Optimized Inflated 3D Convolutional Neural Networks for Robust Human Action Recognition in Surveillance Videos." *Volume 3*, 2, 48-57